GlyphPattern: An Abstract Pattern Recognition for Vision-Language Models

Wu, Zixuan, Kim, Yoolim, Anderson, Carolyn Jane

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) building upon the foundation of powerful large language models have made rapid progress in reasoning across visual and textual data. While VLMs perform well on vision tasks that they are trained on, our results highlight key challenges in abstract pattern recognition. We present GlyphPattern, a 954-item dataset that pairs 318 human-written descriptions of visual patterns from 40 writing systems with three visual presentation styles. GlyphPattern evaluates abstract pattern recognition in VLMs, requiring models to understand and judge natural language descriptions of visual patterns. GlyphPattern patterns are drawn from a large-scale cognitive science investigation of human writing systems; as a result, they are rich in spatial reference and compositionality. Our experiments show that GlyphPattern is challenging for state-of-the-art VLMs (GPT-4o achieves only 55% accuracy), with marginal gains from few-shot prompting. Our detailed error analysis reveals challenges at multiple levels, including visual processing, natural language understanding, and pattern generalization.


CONFINE: Conformal Prediction for Interpretable Neural Networks

Huang, Linhui, Lala, Sayeri, Jha, Niraj K.

arXiv.org Machine Learning

Deep neural networks exhibit remarkable performance, yet their black-box nature limits their utility in fields like healthcare where interpretability is crucial. Existing explainability approaches often sacrifice accuracy and lack quantifiable measures of prediction uncertainty. In this study, we introduce Conformal Prediction for Interpretable Neural Networks (CONFINE), a versatile framework that generates prediction sets with statistically robust uncertainty estimates instead of point predictions to enhance model transparency and reliability. CONFINE not only provides example-based explanations and confidence estimates for individual predictions but also boosts accuracy by up to 3.6%. We define a new metric, correct efficiency, to evaluate the fraction of prediction sets that contain precisely the correct label, and show that CONFINE achieves a correct efficiency up to 3.3% higher than the original accuracy, matching or exceeding prior methods. CONFINE's marginal and class-conditional coverages attest to its validity across tasks ranging from medical image classification to language understanding. Being adaptable to any pre-trained classifier, CONFINE marks a significant advance towards transparent and trustworthy deep learning applications in critical domains.
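The abstract does not spell out CONFINE's procedure, but the split conformal prediction step it builds on can be sketched as follows. The function name `conformal_prediction_sets`, the choice of `alpha`, and the softmax-based nonconformity score are illustrative assumptions here, not the paper's exact method:

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction: prediction sets with ~(1 - alpha) coverage.

    cal_probs:  (n, k) softmax outputs on a held-out calibration set
    cal_labels: (n,)   true labels for the calibration set
    test_probs: (m, k) softmax outputs on test examples
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    # A label enters the set when its nonconformity score is within the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]
```

On a confident, well-calibrated classifier the sets shrink toward singletons; on ambiguous inputs they grow, which is the uncertainty signal CONFINE exposes alongside its example-based explanations.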


How Far Are We from Intelligent Visual Deductive Reasoning?

Zhang, Yizhe, Bai, He, Zhang, Ruixiang, Gu, Jiatao, Zhai, Shuangfei, Susskind, Josh, Jaitly, Navdeep

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) such as GPT-4V have recently made impressive strides on diverse vision-language tasks. We dig into vision-based deductive reasoning, a more sophisticated but less explored realm, and find previously unexposed blind spots in the current SOTA VLMs. Specifically, we leverage Raven's Progressive Matrices (RPMs) to assess VLMs' abilities to perform multi-hop relational and deductive reasoning relying solely on visual clues. We perform comprehensive evaluations of several popular VLMs employing standard strategies such as in-context learning, self-consistency, and Chain-of-Thought (CoT) on three diverse datasets, including the Mensa IQ test, IntelligenceTest, and RAVEN. The results reveal that despite the impressive capabilities of LLMs in text-based reasoning, we are still far from achieving comparable proficiency in visual deductive reasoning. We found that certain standard strategies that are effective when applied to LLMs do not seamlessly translate to the challenges presented by visual reasoning tasks. Moreover, a detailed analysis reveals that VLMs struggle to solve these tasks mainly because they are unable to perceive and comprehend multiple, confounding abstract patterns in RPM examples.


Forecasting VIX using Bayesian Deep Learning

Hortúa, Héctor J., Mora-Valencia, Andrés

arXiv.org Artificial Intelligence

Investors and regulators are concerned about financial market volatility and crashes. For this reason, the Volatility Index (VIX) was introduced in 1993 by the Chicago Board Options Exchange (CBOE) to assess expected financial market volatility in the short run, i.e., over the next 30 days, since it is calculated as an implied volatility from options on the S&P 500 index at that time-to-maturity [1]. The VIX has proven to be a good predictor of expected stock index shifts, and therefore an early warning of investor sentiment and financial market turbulence (see e.g., [1] and, more recently, [2]). Given its importance for asset managers and regulators, it would be useful to forecast the values of the index; however, the VIX is very difficult to forecast [3]. The literature contains several proposals for predicting time series, classified as conventional and modern methods (see e.g., [4] and the references therein).


Must-know Machine Learning Questions – Logistic Regression

#artificialintelligence

Looking for Machine Learning interview questions and answers to prepare with? We have compiled an ultimate guide of knowledge-based Machine Learning interview questions and answers.


Expanding Holographic Embeddings for Knowledge Completion

Xue, Yexiang, Yuan, Yang, Xu, Zhitian, Sabharwal, Ashish

Neural Information Processing Systems

Neural models operating over structured spaces such as knowledge graphs require a continuous embedding of the discrete elements of this space (such as entities) as well as the relationships between them. Relational embeddings with high expressivity, however, have high model complexity, making them computationally difficult to train. We propose a new family of embeddings for knowledge graphs that interpolate between a method with high model complexity and one, namely Holographic embeddings (HolE), with low dimensionality and high training efficiency. This interpolation, termed HolEx, is achieved by concatenating several linearly perturbed copies of the original HolE. We formally characterize the number of perturbed copies needed to provably recover the full entity-entity or entity-relation interaction matrix, leveraging ideas from Haar wavelets and compressed sensing. In practice, using just a handful of Haar-based or random perturbation vectors results in a much stronger knowledge completion system. On the Freebase FB15K dataset, HolEx outperforms the originally reported HolE by 14.7% on the HITS@10 metric, and the current path-based state-of-the-art method, PTransE, by 4% (absolute).
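HolE's compositional operator is circular correlation, which can be computed with FFTs, and HolEx concatenates HolE applied to perturbed copies of the head embedding. The sketch below illustrates that scoring idea; the names `ccorr`, `hole_score`, and `holex_score` are mine, and the perturbation vectors here are arbitrary placeholders rather than the paper's Haar-based construction:

```python
import numpy as np

def ccorr(a, b):
    """Circular correlation via FFT: (a * b)_k = sum_i a_i * b_{(i+k) mod d}."""
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

def hole_score(e_s, e_o, r):
    """HolE triple score: relation embedding dotted with ccorr of the entities."""
    return float(np.dot(r, ccorr(e_s, e_o)))

def holex_score(e_s, e_o, r_parts, perturb):
    """HolEx-style score: sum of HolE scores over elementwise-perturbed head copies.

    r_parts: one relation vector per copy (the concatenated relation embedding)
    perturb: one perturbation vector c_j per copy
    """
    return sum(np.dot(r_j, ccorr(c_j * e_s, e_o))
               for r_j, c_j in zip(r_parts, perturb))
```

With the all-ones perturbation and a single copy, `holex_score` reduces to `hole_score`, which matches the abstract's framing of HolEx as an interpolation starting from HolE.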


What is Deep Learning and How Does It Work? – Robotic Vision Resources Hub

#artificialintelligence

Facebook automatically finds and tags friends in your photos. Google DeepMind's AlphaGo program trounced champions at the ancient game of Go last year. Skype translates spoken conversations in real time, and pretty accurately too. Behind all this is a type of artificial intelligence called deep learning. Welcome to the world of machine learning and deep neural networks. But what is deep learning, and how does it work?